Support Map and Arrays of Maps in BQ for StorageWrites for Beam Rows #22179
Run Java PreCommit
This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.
fixes #23618
.remove-labels stale
Can you rebase and resolve, since that file has since been refactored?
done
@@ -229,6 +243,8 @@ private static Object messageValueFromRowValue(
    if (value == null) {
      if (fieldDescriptor.isOptional()) {
        return null;
      } else if (fieldDescriptor.isRepeated()) {
        return Lists.newArrayList();
Collections.emptyList()
Done
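The trade-off behind this suggestion can be sketched in plain Java. This is a minimal illustration and not code from the PR: `EmptyListSketch`, `repeatedDefault`, and `isReadOnly` are hypothetical names, and `new ArrayList<>()` stands in for Guava's `Lists.newArrayList()`. `Collections.emptyList()` returns an immutable list, which suits a placeholder that is only ever read.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class EmptyListSketch {

  // Hypothetical stand-in for the null branch in the diff: a repeated field
  // with no value is represented by an empty, read-only list.
  static List<Object> repeatedDefault() {
    return Collections.emptyList();
  }

  // Returns true if the list rejects writes, as Collections.emptyList() does.
  static boolean isReadOnly(List<Object> list) {
    try {
      list.add("x");
      return false;
    } catch (UnsupportedOperationException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(repeatedDefault().isEmpty());    // true
    System.out.println(isReadOnly(repeatedDefault()));  // true
    System.out.println(isReadOnly(new ArrayList<>()));  // false
  }
}
```

If a caller later needs to append to the returned list, the mutable variant would still be required, so the swap is only safe when the value is read-only downstream.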
arrayElementType.getTypeName().isCollectionType()
        || arrayElementType.getTypeName().isMapType()
    ? true
    : false;
remove redundant ternary operator. Also no need for Boolean
Done (what was I thinking)
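For readers following along, the simplification requested above looks roughly like this. The sketch is self-contained and uses a hypothetical `Kind` enum in place of the Beam type name, so only the shape of the change is meaningful; the method names are also made up for illustration.

```java
public class TernarySketch {

  // Hypothetical stand-in for the element type name; only what the sketch needs.
  enum Kind {
    STRING, ARRAY, MAP;

    boolean isCollectionType() {
      return this == ARRAY;
    }

    boolean isMapType() {
      return this == MAP;
    }
  }

  // Before: a boxed Boolean plus a ternary that only restates its condition.
  static Boolean shouldFlatMapVerbose(Kind k) {
    return k.isCollectionType() || k.isMapType() ? true : false;
  }

  // After: the boolean expression is already the answer.
  static boolean shouldFlatMap(Kind k) {
    return k.isCollectionType() || k.isMapType();
  }

  public static void main(String[] args) {
    for (Kind k : Kind.values()) {
      // Both variants agree for every kind.
      System.out.println(k + " " + shouldFlatMapVerbose(k) + " " + shouldFlatMap(k));
    }
  }
}
```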
if (shouldFlatMap) {
  valueStream = valueStream.flatMap(vs -> ((List) vs).stream());
}
explain why flatMap is correct here?
Because BQ does not support arrays of maps, this code flattens those structures (the if condition is computed for exactly that scenario).
[
  map1 [k1,v1] [k2,v2],
  map2 [k3,v3],
  map3 [k4,v4] [k5,v5]
]
------------------------- to
[
  record {key:k1, value:v1},
  record {key:k2, value:v2},
  record {key:k3, value:v3},
  record {key:k4, value:v4},
  record {key:k5, value:v5}
]
It respects the order in the array and the inherent order of iteration in the maps, but it won't check for repeated keys across the maps in the original array.
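The flattening described here can be sketched with plain Java collections. This is an illustration under assumptions, not the PR's code: `FlattenMapsSketch`, `flattenMaps`, and `mapOf` are hypothetical names, the sketch works on `Map` values directly (the PR's snippet streams elements that have already been turned into lists), and `Map.Entry` stands in for the key/value record.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class FlattenMapsSketch {

  // Hypothetical helper: turns an array of maps into one flat list of
  // key/value entries. Array order comes first, then each map's own
  // iteration order. Keys repeated across maps are NOT de-duplicated.
  static <K, V> List<Map.Entry<K, V>> flattenMaps(List<Map<K, V>> maps) {
    return maps.stream()
        .flatMap(m -> m.entrySet().stream())
        .<Map.Entry<K, V>>map(e -> new SimpleImmutableEntry<>(e.getKey(), e.getValue()))
        .collect(Collectors.toList());
  }

  // Small helper to build an insertion-ordered map from alternating keys and values.
  static Map<String, String> mapOf(String... kv) {
    Map<String, String> m = new LinkedHashMap<>();
    for (int i = 0; i + 1 < kv.length; i += 2) {
      m.put(kv[i], kv[i + 1]);
    }
    return m;
  }

  public static void main(String[] args) {
    List<Map<String, String>> arrayOfMaps =
        Arrays.asList(
            mapOf("k1", "v1", "k2", "v2"),
            mapOf("k3", "v3"),
            mapOf("k4", "v4", "k5", "v5"));
    // prints [k1=v1, k2=v2, k3=v3, k4=v4, k5=v5]
    System.out.println(flattenMaps(arrayOfMaps));
  }
}
```

The sketch uses `LinkedHashMap` so the output order is deterministic; with a `HashMap`, the per-map order would be whatever the map's iteration yields.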
FieldType keyFieldType,
FieldType valueFieldType,
Map.Entry<Object, Object> entryValue) {

remove extra line
Done
FieldDescriptor keyFieldDescriptor =
    Preconditions.checkNotNull(descriptor.findFieldByName("key"));
@Nullable Object key = toProtoValue(keyFieldDescriptor, keyFieldType, entryValue.getKey());
if (key != null) {
are null keys allowed?
Good question. AFAICT the backing Map in the Row object is a HashMap (with expected size), so I would say that null keys are allowed in a map property of a Row object.
This code only skips setting the field in the proto when the map's key is null.
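As a quick check of the HashMap behavior mentioned above, here is a minimal sketch. It only demonstrates `java.util.HashMap` itself; whether a Beam Row is backed by a HashMap is the author's observation and is not verified here. `NullKeySketch`, `withNullKey`, and `keysThatWouldBeSet` are hypothetical names, and the guard only imitates the spirit of the `if (key != null)` check in the diff.

```java
import java.util.HashMap;
import java.util.Map;

public class NullKeySketch {

  // Builds a HashMap holding one null key, which java.util.HashMap permits.
  static Map<String, String> withNullKey() {
    Map<String, String> m = new HashMap<>();
    m.put(null, "v0");
    m.put("k1", "v1");
    return m;
  }

  // Hypothetical guard in the spirit of `if (key != null)`: counts the
  // entries whose key would actually be written.
  static long keysThatWouldBeSet(Map<String, String> m) {
    return m.keySet().stream().filter(k -> k != null).count();
  }

  public static void main(String[] args) {
    Map<String, String> m = withNullKey();
    System.out.println(m.size());              // 2
    System.out.println(m.containsKey(null));   // true
    System.out.println(keysThatWouldBeSet(m)); // 1
  }
}
```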
Run Java PreCommit
Run Java_GCP_IO_Direct PreCommit
Is there a way to rerun
Not related to these changes, but I am not sure how to ensure this works without pushing more commits.
Has this been tested e2e (with BigQuery)?
Run Java_GCP_IO_Direct PreCommit
This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions. |
This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.
Could we merge the PR? This is necessary for us to ingest data to BigQuery.
This PR has been closed for a while; I can take some time to migrate the changes to a new branch from master.
…On Thu, Sep 19, 2024 at 6:03 PM zz ***@***.***> wrote:
Could we merge the PR? This is necessary for us to ingest data to BigQuery.
Thanks a lot Pablo!
Currently, the BigQuery table schema utility and the StorageWrites implementation for Beam Rows do not support sending records with Maps as part of their schema.
This PR adds that functionality by transforming the Map into a Message type that contains two fields, key and value, respecting the types coming from upstream while mimicking the behavior of the BigQueryIO PTransform when using TableRows.